Logistic Regression

Logistic regression is the go-to linear classification algorithm for two-class problems. It is easy to implement, easy to understand, and gets good results on a wide variety of problems, even when the method's expectations about your data are violated.

Description

Logistic Regression

Logistic regression is named for the function used at the core of the method, the logistic function.

The logistic function, also called the sigmoid function, was developed by statisticians to describe the properties of population growth in ecology: rising quickly and maxing out at the carrying capacity of the environment. It's an S-shaped curve that can take any real-valued number and map it to a value between 0 and 1, but never exactly at those limits.

$$\frac{1}{1 + e^{-x}}$$

$e$ is the base of the natural logarithms and $x$ is the value that you want to transform via the logistic function.


In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn
%matplotlib inline

In [2]:
x = np.linspace(-6, 6, num = 1000)
plt.figure(figsize = (12,8))
plt.plot(x, 1 / (1 + np.exp(-x))); # Sigmoid Function
plt.title("Sigmoid Function");



The logistic regression equation has a representation very similar to linear regression. The difference is that the output value being modelled is binary.

$$\hat{y}=\frac{e^{\beta_0+\beta_1x_1}}{1+e^{\beta_0+\beta_1x_1}}$$

or

$$\hat{y}=\frac{1.0}{1.0+e^{-\beta_0-\beta_1x_1}}$$

$\beta_0$ is the intercept term

$\beta_1$ is the coefficient for $x_1$

$\hat{y}$ is the predicted output, a real value between 0 and 1. To convert this to a binary output of 0 or 1, the prediction either needs to be rounded to the nearest integer or compared against a cutoff point that marks the boundary between the two classes.


In [3]:
tmp = [0, 0.4, 0.6, 0.8, 1.0]

In [4]:
tmp


Out[4]:
[0, 0.4, 0.6, 0.8, 1.0]

In [5]:
np.round(tmp)


Out[5]:
array([ 0.,  0.,  1.,  1.,  1.])

In [6]:
np.array(tmp) > 0.7


Out[6]:
array([False, False, False,  True,  True], dtype=bool)

Making Predictions with Logistic Regression

$$\hat{y}=\frac{1.0}{1.0+e^{-\beta_0-\beta_1x_i}}$$

$\beta_0$ is the intercept term

$\beta_1$ is the coefficient for $x_i$

$\hat{y}$ is the predicted output, a real value between 0 and 1. To convert this to a binary output of 0 or 1, the prediction either needs to be rounded to the nearest integer or compared against a cutoff point that marks the boundary between the two classes.


In [7]:
dataset = [[-2.0011, 0],
           [-1.4654, 0],
           [0.0965, 0],
           [1.3881, 0],
           [3.0641, 0],
           [7.6275, 1],
           [5.3324, 1],
           [6.9225, 1],
           [8.6754, 1],
           [7.6737, 1]]

Let's say you have been provided with the coefficients


In [8]:
coef = [-0.806605464, 0.2573316]

In [9]:
for row in dataset:
    yhat = 1.0 / (1.0 + np.exp(- coef[0] - coef[1] * row[0]))
    print("yhat {0:.4f}, yhat {1}".format(yhat, round(yhat)))


yhat 0.2106, predicted class 0.0
yhat 0.2344, predicted class 0.0
yhat 0.3139, predicted class 0.0
yhat 0.3895, predicted class 0.0
yhat 0.4955, predicted class 0.0
yhat 0.7606, predicted class 1.0
yhat 0.6377, predicted class 1.0
yhat 0.7261, predicted class 1.0
yhat 0.8063, predicted class 1.0
yhat 0.7628, predicted class 1.0
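
To get a feel for what these two coefficients describe, the curve they define can be plotted over the data points. This is a minimal sketch that reuses the `dataset` and `coef` lists from the cells above; the plotting range is an arbitrary choice.


In [ ]:
# Plot the data and the sigmoid curve implied by the given coefficients
import numpy as np
import matplotlib.pyplot as plt

x_grid = np.linspace(-3, 10, num=500)
p_grid = 1.0 / (1.0 + np.exp(-coef[0] - coef[1] * x_grid))

plt.figure(figsize=(12, 8))
plt.plot(x_grid, p_grid, label="predicted probability")
plt.scatter([row[0] for row in dataset], [row[1] for row in dataset],
            color="red", label="data")
plt.axhline(0.5, linestyle="--", color="grey", label="0.5 cutoff")
plt.legend();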

Learning the Logistic Regression Model

The coefficients (beta values) of the logistic regression algorithm must be estimated from your training data. This is done using maximum-likelihood estimation.

Maximum-likelihood estimation is a common learning algorithm used by a variety of machine learning algorithms, although it does make assumptions about the distribution of your data (more on this when we talk about preparing your data).

The best coefficients would result in a model that predicts a value very close to 1 (e.g. male) for the default class and a value very close to 0 (e.g. female) for the other class. The intuition for maximum likelihood in logistic regression is that a search procedure seeks the coefficient values (beta values) that minimize the error between the probabilities predicted by the model and those in the data (e.g. a probability of 1 if the example belongs to the primary class).

We are not going to go into the math of maximum likelihood. It is enough to say that a minimization algorithm is used to find the best values of the coefficients for your training data. This is often implemented in practice using an efficient numerical optimization algorithm (such as a quasi-Newton method).

When you are learning logistic regression, you can implement it yourself from scratch using the much simpler gradient descent algorithm.

Learning with Stochastic Gradient Descent

Logistic regression can use stochastic gradient descent to update the coefficients.

On each gradient descent iteration, the coefficients are updated using the equation:

$$\beta=\beta+\textrm{learning rate}\times (y-\hat{y}) \times \hat{y} \times (1-\hat{y}) \times x $$
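
As a rough illustration, this update rule can be coded directly against the small dataset above. The following is a minimal sketch, not scikit-learn's optimizer: the coefficients start at zero, and the learning rate of 0.3 and 100 epochs are arbitrary assumptions.


In [ ]:
# Sketch: estimate the coefficients with stochastic gradient descent
import numpy as np

l_rate = 0.3      # assumed learning rate
n_epoch = 100     # assumed number of passes over the data
b0, b1 = 0.0, 0.0

for epoch in range(n_epoch):
    for x, y in dataset:
        yhat = 1.0 / (1.0 + np.exp(-b0 - b1 * x))
        error = y - yhat
        # beta = beta + learning rate * (y - yhat) * yhat * (1 - yhat) * x
        b0 = b0 + l_rate * error * yhat * (1.0 - yhat)        # x = 1 for the intercept
        b1 = b1 + l_rate * error * yhat * (1.0 - yhat) * x

print("b0 = {0:.4f}, b1 = {1:.4f}".format(b0, b1))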

Using Scikit Learn to Estimate Coefficients


In [10]:
from sklearn.linear_model import LogisticRegression

In [11]:
dataset


Out[11]:
[[-2.0011, 0],
 [-1.4654, 0],
 [0.0965, 0],
 [1.3881, 0],
 [3.0641, 0],
 [7.6275, 1],
 [5.3324, 1],
 [6.9225, 1],
 [8.6754, 1],
 [7.6737, 1]]

In [13]:
X = np.array(dataset)[:, 0:1]
y = np.array(dataset)[:, 1]

In [14]:
X


Out[14]:
array([[-2.0011],
       [-1.4654],
       [ 0.0965],
       [ 1.3881],
       [ 3.0641],
       [ 7.6275],
       [ 5.3324],
       [ 6.9225],
       [ 8.6754],
       [ 7.6737]])

In [15]:
y


Out[15]:
array([ 0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  1.,  1.])

In [22]:
clf_LR = LogisticRegression(C=1.0, penalty='l2', tol=0.0001)

In [23]:
clf_LR.fit(X,y)


Out[23]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [24]:
clf_LR.predict(X)


Out[24]:
array([ 0.,  0.,  0.,  0.,  1.,  1.,  1.,  1.,  1.,  1.])

In [25]:
clf_LR.predict_proba(X)


Out[25]:
array([[ 0.89565647,  0.10434353],
       [ 0.86688737,  0.13311263],
       [ 0.74432098,  0.25567902],
       [ 0.59934245,  0.40065755],
       [ 0.38668774,  0.61331226],
       [ 0.0565882 ,  0.9434118 ],
       [ 0.16375204,  0.83624796],
       [ 0.07941859,  0.92058141],
       [ 0.03376776,  0.96623224],
       [ 0.05533009,  0.94466991]])
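
The estimated intercept and coefficient can be read off the fitted estimator via its `intercept_` and `coef_` attributes. With only ten points and the default L2 penalty, these values are not expected to match the hand-picked `coef` list used earlier.


In [ ]:
# Inspect the coefficients estimated by scikit-learn
print(clf_LR.intercept_)   # beta_0
print(clf_LR.coef_)        # beta_1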